Take-home Exercise 1: Application of Spatial Point Patterns Analysis to discover the geographical distribution of Grab hailing services in Singapore

Published

January 23, 2024

Modified

February 2, 2024

1.0 Introduction

1.1. Overview - Setting the Scene

Human mobility, the movement of human beings in space and time, reflects the spatial-temporal characteristics of human behavior. With the advancement Information and Communication Technologies (ICT) especially smart phone, a large volume of data related to human mobility have been collected. By using appropriate GIS analysis methods, these data are potentially useful in supporting smart city planning and management.

In Singapore, one of the important source of data related to human mobility is from Land Transport Authority (LTA) DataMall. Two data sets related to human mobility are provided by the portal, they are: Passenger Volume by Origin Destination Train Stations and Passenger Volume by Origin Destination Bus Stops. One of the limitation of these data sets is that their location are biased to either bus stops or MRT/LRT stations. In 2020, another very interesting human mobility data set called Grab Posisi was released by GRAB, one of the largest shared taxi operator in South-east Asia. There are two data sets been released and one of them is for Singapore.

1.2 Objectives

Geospatial analytics hold tremendous potential to address complex problems facing society.

In this study, we will be working on the following tasks:

  • Apply appropriate spatial point patterns analysis methods

  • Discover the geographical and spatial-temporal distribution of Grab hailing services locations in Singapore

1.3 Getting Started

In this take-home exercise, we will be using the following packages:

  • arrow

  • sf for handling geospatial data

  • tidyverse for performing data science tasks such as importing, wrangling and visualising data

pacman::p_load(arrow, here, lubridate, maptools, tidyverse, tmap, sf, sp, spatstat, spNetwork)

2.0 Data Acquisition

There will be 3 data sets used in this exercise:

Data Format Description Source
Grab Posisi Parquet Grab taxi location points either by origins or destinations Grab
Road Shapefile Road layer within Singapore excluding outer islands Geofabrik download server
Master Plan 2019 Subzone Boundary (No Sea) GeoJSON Singapore boundary layer excluding outer islands Data.gov.sg

2.1 Extracting Geospatial and Aspatial Data Sets

Start by creating a new folder labeled Take-home_Ex01. Within this folder, create a sub-folder named data. Inside the data sub-folder, create two additional sub-folders and rename them geospatial and aspatial respectively.

Unzip the malaysia-singapore-brunei-latest-free.shp.zip folder and place all files, MasterPlan2019SubzoneBoundaryNoSeaGEOJSON.geojson into geospatial sub-folder.

Place all files from GrabPosisi into aspatial sub-folder.

Tip

Did you observe that the file names from GrabPosisi and MasterPlan2019SubzoneBoundaryNoSeaGEOJSON.geojson are quite lengthy? Shortening them could make processing more convenient later on.

Alternatively, we can list.files() to get a list of filenames that contains .parquet extension.

3.0 Geospatial Data Wrangling

3.1 Data Aggregation: Importing and Combining Aspatial parquet files

# Use list.files to get a list of filenames that match the pattern
parquet_files <- list.files(path = "data/aspatial", pattern = "\\.parquet$", full.names = TRUE)

grab_data <- data.frame()

for (file_path in parquet_files) {
  grab_data <- bind_rows(grab_data, read_parquet(file_path))
}

3.2 Data Export: Writing DataFrame to RDS file

write_rds(grab_data, "data/rds/grab_data.rds")

3.3 Data Import

3.3.1 Importing Aspatial Data - Grab Posisi data in RDS format

grab_df <- read_rds("data/rds/grab_data.rds")
str(grab_df)
'data.frame':   30329685 obs. of  9 variables:
 $ trj_id       : chr  "70014" "73573" "75567" "1410" ...
 $ driving_mode : chr  "car" "car" "car" "car" ...
 $ osname       : chr  "android" "android" "android" "android" ...
 $ pingtimestamp: int  1554943236 1555582623 1555141026 1555731693 1555584497 1555395258 1554768955 1554783532 1554898418 1555593189 ...
 $ rawlat       : num  1.34 1.32 1.33 1.26 1.28 ...
 $ rawlng       : num  104 104 104 104 104 ...
 $ speed        : num  18.9 17.7 14 13 14.8 ...
 $ bearing      : int  248 44 34 181 93 73 82 321 324 31 ...
 $ accuracy     : num  3.9 4 3.9 4 3.9 ...

What we can observe is that the grab data contains 30329685 observations and 9 variables. Notice that pingtimestamp is in the wrong format. It should be in date/time format and not integer. We will need to convert the data type in the next section (Data Preparation).

3.3.2 Importing Geospatial Data - Road data in shapefile format

We can import geospatial data into RStudio using st_read() of sf package. Let’s try it now!

roads_sf <- st_read(dsn="data/geospatial",
                   layer="gis_osm_roads_free_1")
Reading layer `gis_osm_roads_free_1' from data source 
  `C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1759836 features and 10 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 99.66041 ymin: 0.8021131 xmax: 119.2601 ymax: 7.514393
Geodetic CRS:  WGS 84
str(roads_sf)
Classes 'sf' and 'data.frame':  1759836 obs. of  11 variables:
 $ osm_id  : chr  "4386520" "4578273" "4579495" "4579533" ...
 $ code    : int  5113 5114 5122 5122 5122 5122 5141 5122 5122 5122 ...
 $ fclass  : chr  "primary" "secondary" "residential" "residential" ...
 $ name    : chr  "Orchard Road" "Jalan Bukit Bintang" "Jalan Nagasari" "Persiaran Raja Chulan" ...
 $ ref     : chr  NA NA NA NA ...
 $ oneway  : chr  "F" "F" "B" "B" ...
 $ maxspeed: int  50 0 0 0 0 0 0 0 0 0 ...
 $ layer   : num  0 0 0 0 0 0 -1 0 0 0 ...
 $ bridge  : chr  "F" "F" "F" "F" ...
 $ tunnel  : chr  "F" "F" "F" "F" ...
 $ geometry:sfc_LINESTRING of length 1759836; first list element:  'XY' num [1:2, 1:2] 103.83 103.83 1.31 1.31
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:10] "osm_id" "code" "fclass" "name" ...

3.3.3 Importing Geospatial Data - Master Plan 2019 Subzone Boundary (No Sea) in shapefile format

mpsz_sf <- st_read(dsn = "data/geospatial", 
                layer = "MPSZ-2019")
Reading layer `MPSZ-2019' from data source 
  `C:\kt526\IS415-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
str(mpsz_sf)
Classes 'sf' and 'data.frame':  332 obs. of  7 variables:
 $ SUBZONE_N : chr  "MARINA EAST" "INSTITUTION HILL" "ROBERTSON QUAY" "JURONG ISLAND AND BUKOM" ...
 $ SUBZONE_C : chr  "MESZ01" "RVSZ05" "SRSZ01" "WISZ01" ...
 $ PLN_AREA_N: chr  "MARINA EAST" "RIVER VALLEY" "SINGAPORE RIVER" "WESTERN ISLANDS" ...
 $ PLN_AREA_C: chr  "ME" "RV" "SR" "WI" ...
 $ REGION_N  : chr  "CENTRAL REGION" "CENTRAL REGION" "CENTRAL REGION" "WEST REGION" ...
 $ REGION_C  : chr  "CR" "CR" "CR" "WR" ...
 $ geometry  :sfc_MULTIPOLYGON of length 332; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:300, 1:2] 104 104 104 104 104 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:6] "SUBZONE_N" "SUBZONE_C" "PLN_AREA_N" "PLN_AREA_C" ...

We can observe that both roads_sf and mpsz_sf are currently using the WGS 84 geographic coordinate system.

Summary Points

After looking at the file structure and contents of the 3 datasets, we need to perform the following preparations:

  • grab_df

    • Converting data type

    • Extracting trip starting location

    • Converting aspatial data into geospatial data

  • roads_sf and mpsz_sf

    • Performing projection transformation

    • Extracting study area

    • Getting road layer within Singapore excluding outer islands

3.4 Data Preparation

grab_df

3.4.1 Preparing and Converting grab_df into grab_sf (sf tibble data.framedata)

# Converting data type using as_datetime()
grab_df$pingtimestamp <- as_datetime(grab_df$pingtimestamp)

# Checking first n rows of data frame using head()
head(grab_df)

grab_origins_sf <- grab_df %>% 
  group_by(trj_id) %>% 
  arrange(pingtimestamp) %>% 
  filter(row_number()==1) %>%
  mutate(weekday = wday(pingtimestamp,
                        label=TRUE,
                        abbr=TRUE),
         start_hr = hour(pingtimestamp),
         day = mday(pingtimestamp)) %>%
  st_as_sf(coords = c("rawlng", "rawlat"),
           crs = 4326) %>%
  st_transform(crs = 3414)

# Getting geometry details using st_geometry()
st_geometry(grab_origins_sf)
1
Converting data type for pingtimestamp from integer into datetime using as_datetime() from lubridate package
2
By default, the head() returns the first 6 rows
3
Converting grab_df into sf tibble data.frame using st_as_sf() and its location information
4
Transforming coordinates using st_transform()
  trj_id driving_mode  osname       pingtimestamp   rawlat   rawlng    speed
1  70014          car android 2019-04-11 00:40:36 1.342326 103.8890 18.91000
2  73573          car android 2019-04-18 10:17:03 1.321781 103.8564 17.71908
3  75567          car android 2019-04-13 07:37:06 1.327088 103.8613 14.02155
4   1410          car android 2019-04-20 03:41:33 1.262482 103.8238 13.02652
5   4354          car android 2019-04-18 10:48:17 1.283799 103.8072 14.81294
6  32630          car android 2019-04-16 06:14:18 1.300330 103.9062 23.23818
  bearing accuracy
1     248      3.9
2      44      4.0
3      34      3.9
4     181      4.0
5      93      3.9
6      73      3.9
Geometry set for 28000 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3628.243 ymin: 25198.14 xmax: 49845.23 ymax: 49689.64
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

3.4.2 Creating ppp objects

grab_origins_ppp <- grab_origins_sf %>%
  as('Spatial') %>%
  as('SpatialPoints') %>%
  as('ppp')
summary(grab_origins_ppp)
Planar point pattern:  28000 points
Average intensity 2.473666e-05 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
                    (46220 x 24490 units)
Window area = 1131920000 square units
# Checking for duplicated points
any(duplicated(grab_origins_ppp))
[1] FALSE
plot(grab_origins_ppp)

3.4.3 Performing projection transformation

To perform projection transformation, we will st_transform(). Additionally, we will use st_geometry() to inspect the contents of roads_sf and mpsz_sf after the projection transformation.

roads_sf

# Transforming coordinates using st_transform()
roads_sf <- st_transform(roads_sf,
                           crs = 3414)

# Getting geometry details using st_geometry()
st_geometry(roads_sf)
Geometry set for 1759836 features 
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -434085.6 ymin: -23036.54 xmax: 1759022 ymax: 741993.8
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

mpsz_sf

# Transforming coordinates using st_transform()
mpsz_sf <- st_transform(mpsz_sf,
                          crs = 3414)

# Getting geometry details using st_geometry()
st_geometry(mpsz_sf)
Geometry set for 332 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

Now, we observe that both roads_sf and mpsz_sf are using the SVY21 projected coordinates system.

3.4.4 Extracting specific planning area - Downtown Core

Due to the intensive computational power required to conduct analysis on the whole of Singapore, we will be focusing on the Downtown Core planning area for this analysis as we want to investigate the spatial data point analysis in the Central Business District areas.

sg_downtown_sf <- mpsz_sf %>%
  filter(PLN_AREA_N == "DOWNTOWN CORE")
plot(sg_downtown_sf["PLN_AREA_N"], main = "DOWNTOWN CORE")

#sg_sf <- mpsz_sf %>%
#  st_union()

3.4.5 Creating owin object - Downtown Core

downtown_owin <- as.owin(sg_downtown_sf)
summary(downtown_owin)
Window: polygonal boundary
single connected closed polygon with 637 vertices
enclosing rectangle: [28896.26, 31980.09] x [27914.19, 31855.48] units
                     (3084 x 3941 units)
Window area = 5083640 square units
Fraction of frame area: 0.418
plot(downtown_owin)

3.4.6 Combining grab data points with study area - Downtown Core

summary(downtown_owin)
Window: polygonal boundary
single connected closed polygon with 637 vertices
enclosing rectangle: [28896.26, 31980.09] x [27914.19, 31855.48] units
                     (3084 x 3941 units)
Window area = 5083640 square units
Fraction of frame area: 0.418
grab_origins_dt_ppp <- grab_origins_ppp[downtown_owin]
plot(grab_origins_dt_ppp, main="DOWNTOWN CORE")

3.4.7 Getting grab points within Downtown Core

sg_downtown_grab_sf <- st_intersection(grab_origins_sf, sg_downtown_sf)

3.4.8 Getting road layer within Singapore excluding outer islands

In order to get the road layer within Singapore (excluding outer islands), we can use st_intersection() from sf package to confine the analysis to Singapore’s boundary (in specifics, confining to the Downtown core planning area).

sg_downtown_road_sf <- st_intersection(roads_sf, sg_downtown_sf)
sg_downtown_road_sf <- st_cast(sg_downtown_road_sf, "LINESTRING")

3.4.9 Visualizing the Geospatial data

tmap_mode('view')
tm_shape(sg_downtown_grab_sf) + 
  tm_dots()
tmap_mode('plot')

4.0 1st Order Spatial Point Pattern Analysis (SPPA)

4.1 Kernel Density Estimation (KDE)

kde_grab_origins_dt_bw <- density(grab_origins_dt_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                              kernel="gaussian")

plot(kde_grab_origins_dt_bw, main = "KDE Grab Origins for Downtown Core")

grab_origins_dt_ppp.km <- rescale(grab_origins_dt_ppp, 1000, "km")
kde_grab_origins_dt_bw <- density(grab_origins_dt_ppp.km,
                              sigma=bw.diggle,
                              edge=TRUE,
                              kernel="gaussian")

plot(kde_grab_origins_dt_bw, main = "KDE Grab Origins for Downtown Core")

Working with different automatic bandwidth methods and kernels

par(mfrow=c(2,4))
# Using bw.diggle() bw
plot(density(grab_origins_dt_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"), 
     main="Gaussian")

plot(density(grab_origins_dt_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="epanechnikov"), 
     main="Epanechnikov")

plot(density(grab_origins_dt_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="quartic"), 
     main="Quartic")

plot(density(grab_origins_dt_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="disc"), 
     main="Disc")

# Using bw.ppl() bw
par(mfrow=c(2,2))

plot(density(grab_origins_dt_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="gaussian"), 
     main="Gaussian")
plot(density(grab_origins_dt_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="epanechnikov"), 
     main="Epanechnikov")
plot(density(grab_origins_dt_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="quartic"), 
     main="Quartic")
plot(density(grab_origins_dt_ppp.km, 
             sigma=bw.ppl, 
             edge=TRUE, 
             kernel="disc"), 
     main="Disc")

4.2 Network Kernel Density Estimation (NKDE)

4.2.1 Visualizing the Geospatial data

tmap_mode('view')
tm_shape(sg_downtown_grab_sf) + 
  tm_dots() + 
  tm_shape(sg_downtown_road_sf) +
  tm_lines()
tmap_mode('plot')

4.2.2 Preparing the lixel objects

lixels <- lixelize_lines(sg_downtown_road_sf, 
                         700, 
                         mindist = 350)

4.2.3 Generating line centre points

samples <- lines_center(lixels)

4.2.4 Performing NetKDE

densities <- nkde(sg_downtown_road_sf, 
                  events = sg_downtown_grab_sf,
                  w = rep(1,nrow(sg_downtown_grab_sf)),
                  samples = samples,
                  kernel_name = "quartic",
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 5, #we aggregate events within a 5m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)
samples$density <- densities
lixels$density <- densities

# rescaling to help the mapping
samples$density <- samples$density*1000
lixels$density <- lixels$density*1000
tmap_mode('view')
tm_shape(lixels)+
  tm_lines(col="density") +
  tm_shape(sg_downtown_grab_sf) +
  tm_dots()
tmap_mode('plot')